Fix #1145: Webhook (Teams/Slack) alerts re-fire after an app restart #1147
Merged
#981 added restart-dedup for the email channel only; a restart cleared the two guards that suppress a webhook re-send, so reopening Lite re-posted a Teams/Slack alert already delivered before the restart (identical Dedup Key and Occurrences).
Two-part fix:
1. Webhook cooldown seed (shared, BOTH apps). WebhookAlertService now seeds its per-(serverId, metricName) cooldown from alert history on first use, mirroring the email seed, via a new IAlertHistoryStore.GetLastWebhookSentUtcAsync. Lite filters notification_type IN ('webhook','email+webhook'); Dashboard filters NotificationType == "webhook". send_error is NOT filtered on -- it tracks the email channel, so an email-failed-but-webhook-sent row must still seed. Wired into the WebhookAlertService DI in both MainWindows.
2. Edge-trigger watermark persistence (Lite). The rolling-count gate's in-memory watermark (#1091) reset to 0 on restart, so the first sweep re-fired for events still in the 1-hour lookback -- and because that gap can exceed the cooldown, the seed alone (time-bounded) does not cover it. The watermark now persists to a new config_edge_trigger_watermarks DuckDB table (upsert on change), seeded before the first sweep at startup.
Dashboard needs no watermark persistence: its deadlock gate re-baselines on restart (raw delta) or is 5-min-windowed (always within the cooldown the seed now covers), and blocking is level+cooldown -- none produce the byte-identical duplicate the Lite edge-trigger gate does.
Tests: Lite 505 + Dashboard 487 green. New: webhook-row history filter + watermark save/load/upsert round-trips + WebhookAlertService seed-suppresses / seed-older-than-cooldown-does-not / null-store-attempts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes #1145.
Problem
#981 added restart-dedup for the email channel only. A restart cleared the two guards that suppress a webhook re-send, so reopening Lite re-posted a Teams/Slack alert already delivered before the restart — identical Dedup Key and Occurrences. For a webhook-only deployment there was zero restart protection.
Two independent guards must fail together: the in-memory edge-trigger watermark (#1091) resets to 0 on restart, making the gate fire; the in-memory webhook cooldown is empty, letting the post through.
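To make the failure mode concrete, here is a minimal sketch of the two in-memory guards and what a restart does to them. It is an illustrative model only, not the project's code: the class `InMemoryGuards`, the method `should_post`, and the 15-minute cooldown value are assumptions chosen for the example, and the real gate may update its state differently.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(minutes=15)  # illustrative value, assumed for this sketch


class InMemoryGuards:
    """Hypothetical stand-in for the two pre-fix guards, both held only in memory."""

    def __init__(self):
        self.watermark = 0   # edge-trigger watermark: starts at 0 on every launch
        self.last_sent = {}  # (server_id, metric_name) -> time of last webhook post

    def should_post(self, key, rolling_count, now):
        # Guard 1: the edge trigger fires only when the rolling count exceeds the watermark.
        if rolling_count <= self.watermark:
            return False
        # Guard 2: the cooldown suppresses a post if one went out recently for this key.
        last = self.last_sent.get(key)
        if last is not None and now - last < COOLDOWN:
            return False
        self.watermark = rolling_count
        self.last_sent[key] = now
        return True


key = ("server-1", "deadlocks")
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

before = InMemoryGuards()
first = before.should_post(key, rolling_count=3, now=t0)
repeat = before.should_post(key, rolling_count=3, now=t0 + timedelta(minutes=1))

# A restart is modelled as a fresh instance: watermark back to 0, cooldown map empty.
after_restart = InMemoryGuards()
duplicate = after_restart.should_post(key, rolling_count=3, now=t0 + timedelta(minutes=2))
```

In this model `first` is the genuine alert, `repeat` is suppressed by the watermark, and `duplicate` goes out again after the restart because neither guard remembers the earlier post.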
Fix
1. Webhook cooldown seed (shared, BOTH apps).
`WebhookAlertService` now seeds its per-`(serverId, metricName)` cooldown from alert history on first use, mirroring the email seed (#981), via a new `IAlertHistoryStore.GetLastWebhookSentUtcAsync`. Lite filters `notification_type IN ('webhook','email+webhook')`; Dashboard filters `NotificationType == "webhook"`. `send_error` is not filtered on: it tracks the email channel, so an email-failed-but-webhook-sent row must still seed. Wired into the `WebhookAlertService` DI in both `MainWindow`s.
2. Edge-trigger watermark persistence (Lite).
The rolling-count gate's in-memory watermark (`_lastAlertedBlockingCount` / `_lastAlertedDeadlockCount`) reset to 0 on restart, so the first post-restart sweep re-fired for events still in the 1-hour lookback. Because that gap can exceed the cooldown (gotqn's repro was 17 min > 15-min cooldown), the time-bounded seed alone does not cover it; only persisting the watermark does. The watermark now persists to a new `config_edge_trigger_watermarks` DuckDB table (upsert on change), seeded before the first sweep at startup.
Why Dashboard needs no watermark persistence (parity rationale)
Dashboard's deadlock gate re-baselines on restart (raw delta) or is 5-minute-windowed (`FilteredDeadlockCount`, always within the cooldown the seed now covers); blocking is level + cooldown (re-fires on its normal cadence, now bounded by the seeded cooldown). None produces the byte-identical duplicate the Lite edge-trigger gate does. The webhook cooldown seed (part 1) is the shared half and applies to both apps.
Tests
Webhook-row history filter (`GetLastWebhookSentUtc_FiltersToWebhookRows_IncludingEmailWebhook`); watermark save/load/upsert round-trip (`EdgeTriggerWatermark_SaveLoad_RoundTripsAndUpserts`); `WebhookAlertService` seed-suppresses-within-window / seed-older-than-cooldown-does-not-suppress / null-store-attempts.
🤖 Generated with Claude Code
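The three `WebhookAlertService` seed cases named in the Tests list can be sketched as follows. This is a simplified model under stated assumptions, not the actual service: `FakeHistoryStore`, `WebhookCooldown`, `try_send`, and the 15-minute cooldown are hypothetical names and values, and seeding once per key (rather than once per service) is a guess at what "on first use" means.

```python
from datetime import datetime, timedelta, timezone

COOLDOWN = timedelta(minutes=15)  # illustrative value, assumed for this sketch


class FakeHistoryStore:
    """Hypothetical stand-in for the alert history lookup."""

    def __init__(self, last_sent):
        self._last_sent = last_sent

    def get_last_webhook_sent_utc(self, server_id, metric_name):
        return self._last_sent.get((server_id, metric_name))


class WebhookCooldown:
    """Hypothetical cooldown that seeds itself from history on first use of a key."""

    def __init__(self, store):
        self._store = store  # may be None (the null-store case)
        self._last_sent = {}
        self._seeded = set()

    def try_send(self, server_id, metric_name, now):
        key = (server_id, metric_name)
        if key not in self._seeded:
            # Seed at most once per key, and only if a store is available.
            self._seeded.add(key)
            if self._store is not None:
                seeded = self._store.get_last_webhook_sent_utc(server_id, metric_name)
                if seeded is not None:
                    self._last_sent[key] = seeded
        last = self._last_sent.get(key)
        if last is not None and now - last < COOLDOWN:
            return False  # still inside the cooldown window: suppress
        self._last_sent[key] = now
        return True


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
key = ("server-1", "deadlocks")

recent = WebhookCooldown(FakeHistoryStore({key: now - timedelta(minutes=5)}))
stale = WebhookCooldown(FakeHistoryStore({key: now - timedelta(minutes=17)}))
no_store = WebhookCooldown(None)

within_window = recent.try_send(*key, now=now)     # seed inside the window
older_than_cooldown = stale.try_send(*key, now=now)  # seed outside the window
null_store = no_store.try_send(*key, now=now)      # nothing to seed from
```

Only the first call is suppressed. The second case uses the 17-minute gap from the repro: the seed is time-bounded, so it cannot stop that send, which is the gap the persisted watermark in part 2 closes.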